Building Efficient Query Engines in a High-Level Language
In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique to regain efficiency is to apply generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming makes it easy to implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it enables continuous optimization of the query engine. We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler; (b) these performance improvements require programming just a few hundred lines of high-level code instead of the complicated low-level code required by existing query compilers; and (c) the compilation overhead is low compared to the overall execution time, making our approach practical for efficiently compiling query engines.
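To make the generative-programming idea concrete, here is a minimal, self-contained Scala sketch. It is not LegoBase's actual code or API (the names Field and emitScan are illustrative): a high-level operator definition does not execute a scan itself but emits specialized, low-level C for one specific schema and predicate.

```scala
// Minimal sketch of generative programming: a high-level "operator" does not
// execute a scan itself; it emits specialized, low-level C.
// Names (Field, emitScan) are illustrative, not LegoBase's actual API.

case class Field(name: String, cType: String)

/** Emits a C loop that scans `table` rows and keeps those matching `pred`. */
def emitScan(table: String, fields: Seq[Field], pred: String): String = {
  val struct = fields.map(f => s"  ${f.cType} ${f.name};").mkString("\n")
  s"""
  |struct ${table}_row {
  |$struct
  |};
  |
  |void scan_$table(struct ${table}_row* rows, long n) {
  |  for (long i = 0; i < n; i++) {
  |    if ($pred) {
  |      consume(&rows[i]);   /* hand the tuple to the parent operator */
  |    }
  |  }
  |}
  |""".stripMargin
}

// Specializing the generator per query yields schema-specific C with no
// interpretation overhead, while the engine itself stays high-level Scala.
println(emitScan("lineitem",
  Seq(Field("l_quantity", "double"), Field("l_discount", "double")),
  "rows[i].l_quantity < 24"))
```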
Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms
Network models of healthcare systems can be used to examine how providers
collaborate, communicate, and refer patients to each other. Most healthcare service
network models have been constructed from patient claims data, using billing
claims to link patients with providers. The data sets can be quite large,
making standard methods for network construction computationally challenging
and thus requiring the use of alternate construction algorithms. While these
alternate methods have seen increasing use in generating healthcare networks,
there is little to no literature comparing the differences in the structural
properties of the generated networks. To address this issue, we compared the
properties of healthcare networks constructed using different algorithms and
the 2013 Medicare Part B outpatient claims data. Three different algorithms
were compared: binning, sliding frame, and trace-route. Unipartite networks
linking either providers or healthcare organizations by shared patients were
built using each method. We found that each algorithm produced networks with
substantially different topological properties. Provider networks adhered to a
power law, and organization networks to a power law with exponential cutoff.
Censoring networks to exclude edges with fewer than 11 shared patients, a common
de-identification practice for healthcare network data, markedly reduced edge
numbers and greatly altered measures of vertex prominence such as the
betweenness centrality. We identified patterns in the distance patients travel
between network providers, most strikingly between providers in the
Northeast United States and Florida. We conclude that the choice of network
construction algorithm is critical for healthcare network analysis, and discuss
the implications for selecting the algorithm best suited to the type of
analysis to be performed.
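As a rough illustration of the construction and censoring steps described above (this is not the paper's code; Claim, providerNetwork, and the 11-patient threshold parameter are hypothetical names), the following Scala sketch projects patient-provider claims into a unipartite provider network linked by shared patients and then drops edges with fewer than 11 shared patients:

```scala
// Sketch (not the paper's implementation) of projecting patient-provider claims
// into a unipartite provider network linked by shared patients, then censoring
// edges supported by fewer than `minShared` patients.

// A claim is simplified to (patient, provider); field names are illustrative.
case class Claim(patient: String, provider: String)

def providerNetwork(claims: Seq[Claim], minShared: Int = 11): Map[(String, String), Int] = {
  // Group providers by patient: each patient contributes one co-occurrence
  // per provider pair that billed for them.
  val byPatient: Map[String, Set[String]] =
    claims.groupBy(_.patient).map { case (p, cs) => p -> cs.map(_.provider).toSet }

  val pairCounts = scala.collection.mutable.Map.empty[(String, String), Int]
  for {
    providers <- byPatient.values
    pair      <- providers.toSeq.sorted.combinations(2)  // undirected edge, canonical order
  } {
    val key = (pair(0), pair(1))
    pairCounts(key) = pairCounts.getOrElse(key, 0) + 1
  }

  // Censor (drop) edges with fewer than `minShared` shared patients.
  pairCounts.filter { case (_, shared) => shared >= minShared }.toMap
}
```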
Building-Blocks for Performance Oriented DSLs
Domain-specific languages raise the level of abstraction in software
development. While it is evident that programmers can more easily reason about
very high-level programs, the same holds for compilers only if the compiler has
an accurate model of the application domain and the underlying target platform.
Since mapping high-level, general-purpose languages to modern, heterogeneous
hardware is becoming increasingly difficult, DSLs are an attractive way to
capitalize on improved hardware performance, precisely by making the compiler
reason on a higher level. Implementing efficient DSL compilers is a daunting
task, however, and support for building performance-oriented DSLs is urgently
needed. To this end, we present the Delite Framework, an extensible toolkit
that drastically simplifies building embedded DSLs and compiling DSL programs
for execution on heterogeneous hardware. We discuss several building blocks in
some detail and present experimental results for the OptiML machine-learning
DSL implemented on top of Delite.
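The following toy Scala sketch, which does not use Delite's actual API, illustrates the kind of building block the abstract alludes to: embedded-DSL operations construct an intermediate representation instead of executing eagerly, so a compiler stage can rewrite the program (here, fusing element-wise maps) before it is executed or mapped onto hardware.

```scala
// Toy illustration (not Delite's API) of an embedded DSL: operations build an
// IR that a compiler pass can optimize before the program runs.

sealed trait Exp
case class VectorLit(xs: Vector[Double])      extends Exp
case class Map1(f: Double => Double, in: Exp) extends Exp   // element-wise op
case class Sum(in: Exp)                       extends Exp

// One simple optimization pass: fuse two element-wise maps into one traversal.
def fuse(e: Exp): Exp = e match {
  case Map1(g, Map1(f, in)) => fuse(Map1(x => g(f(x)), in))
  case Sum(in)              => Sum(fuse(in))
  case other                => other
}

// A naive interpreter standing in for the code-generation/back-end stage.
def eval(e: Exp): Vector[Double] = e match {
  case VectorLit(xs) => xs
  case Map1(f, in)   => eval(in).map(f)
  case Sum(in)       => Vector(eval(in).sum)
}

val prog = Sum(Map1(_ * 2.0, Map1(_ + 1.0, VectorLit(Vector(1.0, 2.0, 3.0)))))
println(eval(fuse(prog)))   // Vector(18.0)
```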
Spoofax at Oracle: Domain-Specific Language Engineering for Large-Scale Graph Analytics
For the last decade, teams at Oracle have relied on the Spoofax language workbench to develop a family of domain-specific languages for graph analytics, both in research projects and in product development. In this paper, we analyze the requirements for integrating language processors into large-scale graph analytics toolkits and for developing these language processors as part of a larger product development process. We discuss how Spoofax helps to meet these requirements and point out the need for future improvements.
The LDBC Graphalytics Benchmark
In this document, we describe LDBC Graphalytics, an industrial-grade
benchmark for graph analysis platforms. The main goal of Graphalytics is to
enable the fair and objective comparison of graph analysis platforms. Due to
the diversity of bottlenecks and performance issues such platforms need to
address, Graphalytics consists of a set of selected deterministic algorithms
for full-graph analysis, standard graph datasets, synthetic dataset generators,
and reference output for validation purposes. Its test harness produces deep
metrics that quantify multiple kinds of system scalability, both weak and
strong, and robustness against failures and performance variability. The benchmark
also balances comprehensiveness with the runtime necessary to obtain these deep
metrics. The benchmark comes with open-source software for generating
performance data, for validating algorithm results, for monitoring and sharing
performance data, and for obtaining the final benchmark result as a standard
performance report.
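As a small illustration of reference-output validation (this is not the Graphalytics harness; BFS is used here only as an example of a deterministic full-graph algorithm), the reference output can be computed once on a small graph and a platform's result compared against it exactly:

```scala
// Sketch of reference-output validation in the spirit of the benchmark:
// run a deterministic algorithm (BFS depths here) and compare a platform's
// output against the reference. Names and types are illustrative.

import scala.collection.mutable

def bfsDepths(adj: Map[Long, Seq[Long]], source: Long): Map[Long, Long] = {
  val depth = mutable.Map(source -> 0L)
  val queue = mutable.Queue(source)
  while (queue.nonEmpty) {
    val v = queue.dequeue()
    for (w <- adj.getOrElse(v, Seq.empty) if !depth.contains(w)) {
      depth(w) = depth(v) + 1
      queue.enqueue(w)
    }
  }
  depth.toMap
}

/** A result is valid if it matches the reference exactly (BFS depth is deterministic). */
def validate(platformOutput: Map[Long, Long], reference: Map[Long, Long]): Boolean =
  platformOutput == reference

val graph     = Map(1L -> Seq(2L, 3L), 2L -> Seq(4L), 3L -> Seq(4L), 4L -> Seq.empty[Long])
val reference = bfsDepths(graph, source = 1L)   // Map(1 -> 0, 2 -> 1, 3 -> 1, 4 -> 2)
```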
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms
In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness in the face of failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms.
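To make the scalability metrics concrete, here is a generic Scala sketch (not the Graphalytics harness and not its exact metric definitions): strong scaling keeps the dataset fixed while adding machines, weak scaling grows the dataset with the machine count, and in both cases the basic quantity is a ratio against a single-machine baseline run.

```scala
// Generic sketch (not the benchmark's code) of speedup and parallel-efficiency
// ratios of the kind the scalability metrics are built from. Names are illustrative.

final case class Run(machines: Int, scaleFactor: Int, seconds: Double)

/** Speedup relative to the single-machine baseline (strong scaling keeps scaleFactor fixed). */
def speedup(baseline: Run, run: Run): Double = baseline.seconds / run.seconds

/** Parallel efficiency: speedup divided by the number of machines used. */
def efficiency(baseline: Run, run: Run): Double = speedup(baseline, run) / run.machines

val base   = Run(machines = 1, scaleFactor = 1, seconds = 120.0)
val strong = Run(machines = 8, scaleFactor = 1, seconds = 18.0)   // same dataset, 8 machines
println(f"strong speedup: ${speedup(base, strong)}%.2fx, efficiency: ${efficiency(base, strong)}%.2f")
```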